Loops in Python

What do you do when you have to do loops in Python?
For example: I have a file with 2000 lines and a DB of 6000 lines.
For each line in the file I have to check whether any of the 6000 elements is in the line.

I currently have a solution in C# and Java, but I'm curious how you would go about it in Python, since the loops are very slow.

  1. 2 weeks ago
    Anonymous

    >Python
    Please, leave ...
    ... just be gone

  2. 2 weeks ago
    Anonymous

    One loop to create an array of the 2000 elements.
    One database query to check if a row exists with any such element.
    You may have to chunk it in case 2000 is too many for an IN() clause.
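
    A rough sketch of what I mean (untested; assumes sqlite3 and a made-up table names(name), adjust to your schema):

    import sqlite3

    def find_existing(file_path, db_path, chunk_size=500):
        # Collect the 2000 search terms from the file
        with open(file_path) as f:
            terms = [line.strip() for line in f if line.strip()]

        found = set()
        conn = sqlite3.connect(db_path)
        # One IN() query per chunk instead of one query per line
        for start in range(0, len(terms), chunk_size):
            chunk = terms[start:start + chunk_size]
            placeholders = ",".join("?" * len(chunk))
            rows = conn.execute(
                f"SELECT name FROM names WHERE name IN ({placeholders})", chunk
            )
            found.update(row[0] for row in rows)
        conn.close()
        return found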

    • 2 weeks ago
      Anonymous

      >And besides, why in the world would you loop through records and execute a query per record in a loop? Are you insane?

      I don't get what you mean?

      The problem is simple.
      You have a file and a DB of names, or just "strings", and you have to find whether any of them is in each line. Is my approach moronic? Two for loops, and then a loop over the string?

      >Python
      >Please, leave ...
      >... just be gone

      it's what the industry uses, homosexual

  3. 2 weeks ago
    Anonymous

    And besides, why in the world would you loop through records and execute a query per record in a loop? Are you insane?

    • 2 weeks ago
      Anonymous

      >One loop to create an array of the 2000 elements.
      >One database query to check if a row exists with any such element.
      >You may have to chunk it in case 2000 is too many for an IN() clause.

      def check_substrings(file_path, db_path):
          # Read lines from db.txt
          with open(db_path, 'r') as db_file:
              db_lines = db_file.readlines()

          # Read lines from file.txt
          with open(file_path, 'r') as file:
              file_lines = file.readlines()

          # Check if any line from db.txt is a substring of any line in file.txt
          for db_line_num, db_line in enumerate(db_lines, 1):
              for file_line_num, file_line in enumerate(file_lines, 1):
                  if db_line.strip().upper() in file_line.upper():
                      print(f"Line {db_line_num} in {db_path} matches a substring in line {file_line_num} in {file_path}")

      Generated by ChatGPT. I'm asking if this is too slow and whether it can be made faster.

      • 2 weeks ago
        Anonymous

        no it's perfect
        in fact, ChatGPT is so good you should use it instead of killing a thread for this shit, moronic homosexual

        • 2 weeks ago
          Anonymous

          >In Python, if you're using a for loop it almost always means you're doing something wrong. In this case there is a trivial solution that will also be orders of magnitude faster: convert both lists to a set, then take the intersection.

          >Put all the 6000 elements in a map.
          >For every line in the 2000 lines, see if the element is already there.
          >For bigger sets I don't really know.

          >I imagine the fastest solution that could reasonably be implemented while preserving the features of this current program (such as keeping track of line numbers) would be to convert file_line to a set, then use a list comprehension on db_line. Something like this:

          >file_set = file_line.set()
          >[ print("some_bullshit") for (line_num, line) in enumerate(db_line, 1) if line in file_set ]

          >Haven't tested it so you might need to make some syntax corrections, my Python is a little rusty. But I really doubt you'll find a faster solution than this while keeping things so simple.

          >Benefits:
          >- sets offer average lookup time of O(1)
          >- list comprehensions use optimised C code so they're much faster than for loops
          >- entire algorithm is O(n), compared to the O(n^2) algorithm you posted

          >You're welcome.

          class TrieNode:
              def __init__(self):
                  self.children = {}
                  self.is_end_of_word = False

          class Trie:
              def __init__(self):
                  self.root = TrieNode()

              def insert(self, word):
                  node = self.root
                  for char in word:
                      if char not in node.children:
                          node.children[char] = TrieNode()
                      node = node.children[char]
                  node.is_end_of_word = True

              def search(self, word):
                  # Walk the trie along the given string; return the shortest
                  # db line that is a prefix of it, or None
                  node = self.root
                  matched_string = ""
                  for char in word:
                      if char not in node.children:
                          break
                      node = node.children[char]
                      matched_string += char
                      if node.is_end_of_word:
                          return matched_string
                  return None

          def build_trie(db_lines):
              trie = Trie()
              for line in db_lines:
                  trie.insert(line)
              return trie

          def check_substrings(file_path, db_path):
              # Read lines from db.txt, normalised with .strip().upper()
              with open(db_path, 'r') as db_file:
                  db_lines = [line.strip().upper() for line in db_file]

              # Build a trie from the db lines
              trie = build_trie(db_lines)

              # Read lines from file.txt and try a trie match at every offset
              with open(file_path, 'r') as file:
                  for file_line_num, file_line in enumerate(file, 1):
                      file_line = file_line.strip().upper()
                      current_index = 0
                      while current_index < len(file_line):
                          matched_string = trie.search(file_line[current_index:])
                          if matched_string:
                              print(f"Found a match '{matched_string}' in line {file_line_num}")
                              break
                          current_index += 1

          This is the code ChatGPT came up with; it dropped the runtime from 8 seconds to 1 second.
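
          If that's still too slow, the next step up from this trie is Aho-Corasick, which matches all the db strings in a single pass over each line instead of re-walking the trie from every offset. A sketch using the pyahocorasick library (my assumption, untested; pip install pyahocorasick):

          import ahocorasick

          def check_substrings_ac(file_path, db_path):
              # Build the automaton once from the normalised db lines
              automaton = ahocorasick.Automaton()
              with open(db_path) as db_file:
                  for word in (line.strip().upper() for line in db_file):
                      if word:
                          automaton.add_word(word, word)
              automaton.make_automaton()

              # One linear scan per line finds every db string it contains
              with open(file_path) as file:
                  for file_line_num, file_line in enumerate(file, 1):
                      for _end, word in automaton.iter(file_line.strip().upper()):
                          print(f"Found a match '{word}' in line {file_line_num}")
                          break  # first match per line, like the trie version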

          • 2 weeks ago
            Anonymous

            Test my solution here:

            >I imagine the fastest solution that could reasonably be implemented while preserving the features of this current program (such as keeping track of line numbers) would be to convert file_line to a set, then use a list comprehension on db_line. Something like this:

            >file_set = file_line.set()
            >[ print("some_bullshit") for (line_num, line) in enumerate(db_line, 1) if line in file_set ]

            >Haven't tested it so you might need to make some syntax corrections, my Python is a little rusty. But I really doubt you'll find a faster solution than this while keeping things so simple.

            >Benefits:
            >- sets offer average lookup time of O(1)
            >- list comprehensions use optimised C code so they're much faster than for loops
            >- entire algorithm is O(n), compared to the O(n^2) algorithm you posted

            >You're welcome.

            then let me know the runtime.

            And for god's sake, stop polluting IQfy with your AI slop.

          • 2 weeks ago
            Anonymous

            for the extreme case: 22 seconds down to 10 seconds with your thing
            for smaller cases, the difference is 0.04 s (trie) vs 0.12 s (yours)

            def check_substrings(file_path, db_path):
                # Read lines from db.txt
                with open(db_path, 'r') as db_file:
                    db_lines = [line.strip().upper() for line in db_file]

                # Read lines from file.txt and convert to a set of uppercase lines
                with open(file_path, 'r') as file:
                    file_lines_set = {line.strip().upper() for line in file}

                # Check if any line from db.txt is a substring of any line in file.txt
                matches = [(db_line_num, db_line, file_line_num)
                           for db_line_num, db_line in enumerate(db_lines, 1)
                           for file_line_num, file_line in enumerate(file_lines_set, 1)
                           if db_line in file_line]

                # Print matches
                for db_line_num, db_line, file_line_num in matches:
                    print(f"Line {db_line_num} in {db_path} matches a substring in line {file_line_num} {db_line}")

          • 2 weeks ago
            Anonymous

            You did it wrong. You must use the "in" operation on the SET, not the LIST. Sets have O(1) lookup time, lists have O(n). Fricking dummy.
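
            The difference is easy to demonstrate (a toy benchmark; exact numbers will vary by machine):

            import timeit

            data = list(range(100_000))
            s = set(data)

            # Membership test against the last element: worst case for the list
            print(timeit.timeit("99_999 in data", globals=globals(), number=100))  # O(n) scan
            print(timeit.timeit("99_999 in s", globals=globals(), number=100))     # O(1) hash lookup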

            I'm leaving this thread, have fun with your AI slop.

          • 2 weeks ago
            Anonymous

            # Read lines from db.txt
            with open("db.txt", 'r') as db_file:
                db_lines = db_file.readlines()

            # Read lines from file.txt
            with open("file.txt", 'r') as file:
                file_lines = file.readlines()

            # A set gives O(1) average lookups for the whole-line membership test
            file_set = set(file_lines)
            [ print(f"{line_num}: {line}") for line_num, line in enumerate(db_lines, 1) if line in file_set ]

            Here you go Rajesh, since you're clearly a complete fricking moron I did all the work for you. Just run this script and time it.

          • 2 weeks ago
            Anonymous

            still doesn't work, pajeet, it doesn't match anything
            i don't mean to be toxic, but the generated version worked, while yours doesn't

          • 2 weeks ago
            Anonymous

            I just tested my version and it does work. I tested it with:
            for (( i = 0; i < 5000; i++ )); do echo "$(head -c 300 /dev/urandom | tr -dc 'a-z0-9 ')" >> db.txt; done
            for (( i = 0; i < 5000; i++ )); do echo "$(head -c 300 /dev/urandom | tr -dc 'a-z0-9 ')" >> file.txt; done
            cat file.txt >> db.txt
            shuf -o db.txt db.txt
            time python3 script.py

            real 0m0.110s
            user 0m0.070s
            sys 0m0.041s

            Meanwhile your first solution took 18 seconds, so mine is about 180x faster. I don't know how you managed to break my code when I literally gave you a complete script to run, but you clearly have a talent for being stupid.

          • 2 weeks ago
            Anonymous

            >[ print(f"{line_num}: {line}") for line_num, line in enumerate(db_lines, 1) if line in file_set ]

            This checks if "X" is in file_set; if my file_set contains "homosexuals X homosexuals" it will not match. Meanwhile this

            matches = [(db_line_num, db_line, file_line_num)
                       for db_line_num, db_line in enumerate(db_lines, 1)
                       for file_line_num, file_line in enumerate(file_lines_set, 1)
                       if db_line in file_line]

            does. Why do you have to be so toxic if you didn't understand what I asked?
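
            Concretely, the two "in" tests mean different things:

            # set membership: only exact, whole-line matches
            "X" in {"homosexuals X homosexuals"}   # False

            # substring test on each string: what I actually asked for
            "X" in "homosexuals X homosexuals"     # True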

          • 2 weeks ago
            Anonymous

            I don't even understand what the frick you're trying to say. You are literally a fricking Pajeet.

          • 2 weeks ago
            Anonymous

            file_set1 --> "xxxhomosexualsxxx"
            file_set2 --> "homosexual"

            your code will match file_set2 but not file_set1, given the item "homosexual". What don't you understand? And stop calling me names.

          • 2 weeks ago
            Anonymous

            sorry, from 22 seconds with the first implementation to 1 second with the second one

      • 2 weeks ago
        Anonymous

        for a file with 10k lines and a DB with 2.4k lines,
        it takes 7 seconds with the code I provided

        >no it's perfect
        >in fact, ChatGPT is so good you should use it instead of killing a thread for this shit, moronic homosexual

        it's a sunny day, why are you seething?

      • 2 weeks ago
        Anonymous

        In Python, if you're using a for loop it almost always means you're doing something wrong. In this case there is a trivial solution that will also be orders of magnitude faster: convert both lists to a set, then take the intersection.
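
        For example (untested; note this only catches whole-line matches, not substrings):

        # Normalise both files into sets, then intersect them
        with open("db.txt") as f:
            db_set = {line.strip() for line in f}
        with open("file.txt") as f:
            file_set = {line.strip() for line in f}
        print(db_set & file_set)   # lines that appear in both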

      • 2 weeks ago
        Anonymous

        I imagine the fastest solution that could reasonably be implemented while preserving the features of this current program (such as keeping track of line numbers) would be to convert file_line to a set, then use a list comprehension on db_line. Something like this:

        file_set = file_line.set()
        [ print("some_bullshit") for (line_num, line) in enumerate(db_line, 1) if line in file_set ]

        Haven't tested it so you might need to make some syntax corrections, my Python is a little rusty. But I really doubt you'll find a faster solution than this while keeping things so simple.

        Benefits:
        - sets offer average lookup time of O(1)
        - list comprehensions use optimised C code so they're much faster than for loops
        - entire algorithm is O(n), compared to the O(n^2) algorithm you posted

        You're welcome.

  4. 2 weeks ago
    Anonymous

    Put all the 6000 elements in a map.
    For every line in the 2000 lines, see if the element is already there.
    For bigger sets I don't really know.
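
    Something like this (untested; a dict or set both work here, since you only need membership):

    # Load the 6000 db elements into a set once
    with open("db.txt") as f:
        db_set = {line.strip() for line in f}

    # Then each of the 2000 lookups is O(1) on average
    with open("file.txt") as f:
        for num, line in enumerate(f, 1):
            if line.strip() in db_set:
                print(f"line {num} is in the db")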

  5. 2 weeks ago
    Anonymous

    >not coding in pregabalin
    Ngmi
