What do you do if you have to do loops in Python?
For example: I have a file with 2000 lines and a DB of 6000 lines.
For each line in the file I have to check whether any of the 6000 elements appears in the line.
I currently have a solution in C# and Java, but I'm curious how you would go about it in Python, since the loops are very slow.
>Python
Please, leave ...
... just be gone
One loop to create an array of 2000 elements
One database query to check if a row exists with any such element
Maybe have to chunk it a bit in case 2000 is too many for an IN() clause.
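The chunked IN() idea could be sketched like this with sqlite3; the `names` table and column here are made up for illustration, and 500 is just a safe chunk size well under typical parameter limits:

```python
import sqlite3

def find_existing(conn, values, chunk_size=500):
    """Return the subset of `values` present in the (hypothetical) names table,
    splitting the IN() list into chunks to stay under parameter limits."""
    found = set()
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        # One "?" placeholder per value in this chunk
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT name FROM names WHERE name IN ({placeholders})", chunk)
        found.update(row[0] for row in rows)
    return found
```

This issues one query per chunk instead of one per line, which is the point of the post above.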
I don't get what you mean?
The problem is simple.
You have a file and a DB of names, or just "strings", and you have to find whether any of them appears in each line. Is my approach moronic? Two for loops, and then a scan over the string.
it's what the industry uses
And besides, why in the world would you loop through records and execute a query per record in a loop? Are you insane?
def check_substrings(file_path, db_path):
    # Read lines from db.txt
    with open(db_path, 'r') as db_file:
        db_lines = db_file.readlines()
    # Read lines from file.txt
    with open(file_path, 'r') as file:
        file_lines = file.readlines()
    # Check if any line from db.txt is a substring of any line in file.txt
    for db_line_num, db_line in enumerate(db_lines, 1):
        for file_line_num, file_line in enumerate(file_lines, 1):
            if db_line.strip().upper() in file_line.upper():
                print(f"Line {db_line_num} in {db_path} matches a substring in line {file_line_num} in {file_path}")
Generated by ChatGPT; I'm asking if this is too slow and if it can be made faster.
no it's perfect
in fact, ChatGPT is so good you should use it instead of killing a thread for this
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        matched_string = ""
        for char in word:
            if char not in node.children:
                break
            node = node.children[char]
            matched_string += char
            if node.is_end_of_word:
                return matched_string
        return None

def build_trie(db_lines):
    trie = Trie()
    for line in db_lines:
        trie.insert(line)
    return trie

def check_substrings(file_path, db_path):
    # Read lines from db.txt and convert to a list with .strip().upper()
    with open(db_path, 'r') as db_file:
        db_lines = [line.strip().upper() for line in db_file]
    # Build a trie from db lines
    trie = build_trie(db_lines)
    # Read lines from file.txt
    with open(file_path, 'r') as file:
        for file_line_num, file_line in enumerate(file, 1):
            file_line = file_line.strip().upper()
            current_index = 0
            while current_index < len(file_line):
                matched_string = trie.search(file_line[current_index:])
                if matched_string:
                    print(f"Found a match '{matched_string}' in line {file_line_num}")
                    break
                current_index += 1
This is the code that ChatGPT came up with; it dropped the runtime from 8 seconds to 1 second.
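For comparison, a common pure-Python shortcut for the same multi-pattern substring scan (not from the thread, just a hedged alternative) is to compile all the DB lines into one regex alternation, so the per-line scanning happens in C rather than in a Python loop:

```python
import re

def find_matches(db_lines, file_lines):
    # One alternation of all escaped patterns; assumes db_lines is non-empty.
    # re.search scans each line once, case-insensitively like the .upper() code.
    pattern = re.compile("|".join(re.escape(s) for s in db_lines), re.IGNORECASE)
    hits = []
    for line_num, line in enumerate(file_lines, 1):
        match = pattern.search(line)
        if match:
            hits.append((line_num, match.group(0)))
    return hits
```

For thousands of patterns the compiled alternation can be slow to build but cheap per line, so it is worth timing against the trie on the same data.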
Test my solution here:
then let me know the runtime.
And for god's sake, stop polluting IQfy with your AI slop.
For the extreme case, 22 seconds down to 10 seconds with your thing.
For smaller cases, the difference is 0.04 s (trie) vs 0.12 s (yours).
def check_substrings(file_path, db_path):
    # Read lines from db.txt
    with open(db_path, 'r') as db_file:
        db_lines = [line.strip().upper() for line in db_file]
    # Read lines from file.txt and convert to a set of uppercase lines
    with open(file_path, 'r') as file:
        file_lines_set = {line.strip().upper() for line in file}
    # Check if any line from db.txt is a substring of any line in file.txt
    matches = [(db_line_num, db_line, file_line_num)
               for db_line_num, db_line in enumerate(db_lines, 1)
               for file_line_num, file_line in enumerate(file_lines_set, 1)
               if db_line in file_line]
    # Print matches
    for db_line_num, db_line, file_line_num in matches:
        print(f"Line {db_line_num} in {db_path} matches a substring in line {file_line_num}: {db_line}")
You did it wrong. You must use the "in" operation on the SET, not the LIST. Sets have O(1) lookup time, lists have O(n). Fricking dummy.
I'm leaving this thread, have fun with your AI slop.
# Read lines from db.txt
with open("db.txt", 'r') as db_file:
    db_lines = db_file.readlines()
# Read lines from file.txt
with open("file.txt", 'r') as file:
    file_lines = file.readlines()
file_set = set(file_lines)
[ print(f"{line_num}: {line}") for line_num, line in enumerate(db_lines, 1) if line in file_set ]
Here you go, since you clearly can't manage it I did all the work for you. Just run this script and time it.
Still doesn't work, it doesn't match anything.
I don't mean to be toxic, but the generated version worked, meanwhile yours doesn't.
I just tested my version and it does work, I tested it on
for (( i = 0; i < 5000; i++ )); do echo "$(head -c 300 /dev/urandom | tr -dc 'a-z0-9 ')" >> db.txt; done
for (( i = 0; i < 5000; i++ )); do echo "$(head -c 300 /dev/urandom | tr -dc 'a-z0-9 ')" >> file.txt; done
cat file.txt >> db.txt
shuf -o db.txt db.txt
time python3 script.py
real 0m0.110s
user 0m0.070s
sys 0m0.041s
Meanwhile your first solution took 18 seconds, so mine is about 160x faster. I don't know how you managed to break my code when I literally gave you a complete script to run.
[ print(f"{line_num}: {line}") for line_num, line in enumerate(db_lines, 1) if line in file_set ]
this checks whether "X" is in file_set; if my file_set contains "aaa X aaa" it will not match. Meanwhile this
matches = [(db_line_num, db_line, file_line_num)
           for db_line_num, db_line in enumerate(db_lines, 1)
           for file_line_num, file_line in enumerate(file_lines_set, 1)
           if db_line in file_line]
does. Why do you have to be so toxic if you didn't understand what I asked?
I don't even understand what you're trying to say.
file_set1 --> "xxxFOOxxx"
file_set2 --> "FOO"
Your code will match file_set2 but not file_set1 given the item "FOO". What don't you understand? And stop calling me names.
Sorry, from 22 seconds with the first implementation to 1 second with the second one,
for a file with 10k lines and a DB with 2.4k lines.
It takes 7 seconds to do it with the code I provided.
it's a sunny day, why are you seething?
In Python, if you're using a for loop it almost always means you're doing something wrong. In this case there is a trivial solution that will also be orders of magnitude faster: convert both lists to a set, then take the intersection.
I imagine the fastest solution that could reasonably be implemented while preserving the features of the current program (such as keeping track of line numbers) would be to convert file_lines to a set, then use a list comprehension over db_lines. Something like this:
file_set = set(file_lines)
[ print("some_bullshit") for line_num, line in enumerate(db_lines, 1) if line in file_set ]
Haven't tested it so you might need to make some syntax corrections, my Python is a little rusty. But I really doubt you'll find a faster solution than this while keeping things so simple.
Benefits:
- sets offer average lookup time of O(1)
- list comprehensions use optimised C code so they're much faster than for loops
- entire algorithm is O(n), compared to the O(n^2) algorithm you posted
You're welcome.
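Written out as runnable code, the idea above might look like this (variable names assumed, and case-normalized like the earlier snippets):

```python
def exact_line_matches(db_lines, file_lines):
    # Build the set once; each membership test is then O(1) on average.
    file_set = {line.strip().upper() for line in file_lines}
    return [(line_num, line) for line_num, line in enumerate(db_lines, 1)
            if line.strip().upper() in file_set]
```

Note that this finds whole-line matches only, not substrings, which is the exact distinction argued over elsewhere in the thread.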
Put all 6000 elements in a map.
For every line in the 2000-line file, see if the element is already there.
For bigger sets I don't really know.
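A minimal sketch of the map idea above; in Python a set covers the membership-only case, but a dict works too if you later need to carry a value per element (function names here are made up):

```python
def build_lookup(elements):
    # Dict keyed by element; lookups are O(1) on average, same as a set.
    return {element: True for element in elements}

def lines_present(lines, lookup):
    # Keep only the lines that appear as keys in the lookup.
    return [line for line in lines if line in lookup]
```

Like the set version, this tests exact equality per line, not substring containment.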
>not coding in pregabalin
Ngmi