Tokenization:

import nltk
from nltk.tokenize import word_tokenize

# If word_tokenize raises a LookupError, download the punkt models first
nltk.download('punkt')
# (The underlying sentence tokenizer can also be loaded directly via
# nltk.data.load('tokenizers/punkt/english.pickle').)

text = "This is an example of tokenization."
tokens = word_tokenize(text)
print(tokens)
# Output: ['This', 'is', 'an', 'example', 'of', 'tokenization', '.']
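
The same punkt models also power sentence splitting; a minimal sketch using NLTK's sent_tokenize:

from nltk.tokenize import sent_tokenize

text = "This is the first sentence. This is the second one."
print(sent_tokenize(text))
# Output: ['This is the first sentence.', 'This is the second one.']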

Lemmatization:

import nltk
from nltk.stem import WordNetLemmatizer

# Download the WordNet data on first use
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

# The default part of speech is noun, so verb forms pass through unchanged
print(lemmatizer.lemmatize("running"))
# Output: 'running'
# Pass pos='v' to lemmatize as a verb
print(lemmatizer.lemmatize("ran", pos='v'))
# Output: 'run'
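
In practice the POS tag usually comes from a tagger rather than being hard-coded. A minimal sketch, where get_wordnet_pos is our own illustrative helper (not an NLTK API) mapping Penn Treebank tags to WordNet POS:

from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
nltk.download('averaged_perceptron_tagger')  # tagger models, needed on first use

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag prefix to the corresponding WordNet POS
    if treebank_tag.startswith('V'):
        return wordnet.VERB
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    if treebank_tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

tagged = pos_tag(word_tokenize("She ran quickly"))
print([lemmatizer.lemmatize(word, get_wordnet_pos(tag)) for word, tag in tagged])
# Output: ['She', 'run', 'quickly']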

Stemming:

import nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem('running'))
# Output: 'run'
print(stemmer.stem('runner'))
# Output: 'runner'
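
The Porter stemmer is English-only; NLTK's SnowballStemmer covers several languages (see SnowballStemmer.languages). A quick sketch for English:

from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer('english')
print(stemmer.stem('running'))
# Output: 'run'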

Part-of-Speech Tagging:

import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Download the tagger models on first use
nltk.download('averaged_perceptron_tagger')

text = "I am learning NLP techniques in Python."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print(pos_tags)
# Output: [('I', 'PRP'), ('am', 'VBP'), ('learning', 'VBG'), ('NLP', 'NNP'), ('techniques', 'NNS'), ('in', 'IN'), ('Python', 'NNP'), ('.', '.')]

Named Entity Recognition:

import nltk
from nltk import ne_chunk
from nltk.tokenize import word_tokenize

# Download the chunker models on first use
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Barack Obama was born in Hawaii."
tokens = word_tokenize(text)
tagged_tokens = nltk.pos_tag(tokens)
ner_tree = ne_chunk(tagged_tokens)
print(ner_tree)
# Output: a tree such as
# (S (PERSON Barack/NNP) (PERSON Obama/NNP) was/VBD born/VBN in/IN (GPE Hawaii/NNP) ./.)
# (exact chunks depend on the model version)
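
To pull the labeled entities out of the tree, continuing from the snippet above, iterate over the chunked subtrees:

# Chunked entities are nltk.Tree nodes; plain tokens are (word, tag) tuples
for subtree in ner_tree:
    if hasattr(subtree, 'label'):
        entity = " ".join(word for word, tag in subtree.leaves())
        print(subtree.label(), entity)
# Output: lines such as "PERSON Barack Obama" and "GPE Hawaii"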

Sentiment Analysis:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon on first use
nltk.download('vader_lexicon')

text = "I love this product! It's amazing."
sia = SentimentIntensityAnalyzer()
score = sia.polarity_scores(text)
print(score)
# Output: {'neg': 0.0, 'neu': 0.192, 'pos': 0.808, 'compound': 0.6369}
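
The compound value is the usual decision signal; the VADER authors' convention treats compound >= 0.05 as positive and <= -0.05 as negative. Continuing from the snippet above:

compound = score['compound']
if compound >= 0.05:
    label = 'positive'
elif compound <= -0.05:
    label = 'negative'
else:
    label = 'neutral'
print(label)
# Output: 'positive'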

Text Classification:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Create a toy dataset
data = {'text': ['This is a positive text.', 'This is a negative text.'], 'label': ['positive', 'negative']}
df = pd.DataFrame(data)
# Create a CountVectorizer object
vectorizer = CountVectorizer()
# Transform the text column into a bag-of-words matrix
X = vectorizer.fit_transform(df['text'])
# Create a MultinomialNB object
clf = MultinomialNB()
# Fit the model
clf.fit(X, df['label'])
# Test the model (transform expects an iterable of documents)
text = "This is a neutral text."
X_test = vectorizer.transform([text])
pred = clf.predict(X_test)
print(pred)
# Output: ['negative']
# With only these two near-identical training texts the class scores tie
# (the word "neutral" is unseen), so the tie is broken by class order;
# real training data is needed for meaningful predictions.
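
The vectorizer and classifier can also be bundled into a single scikit-learn Pipeline so one object handles both transformation and prediction; a minimal sketch continuing from the data above:

from sklearn.pipeline import make_pipeline

pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(df['text'], df['label'])
print(pipe.predict(['This is a positive text.']))
# Output: ['positive']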

Language Translation:

from googletrans import Translator

translator = Translator()
text = "I am learning NLP techniques in Python."
# googletrans calls an unofficial Google Translate endpoint, so results
# (and API stability) can vary between versions
translated_text = translator.translate(text, dest='fr').text
print(translated_text)
# Output (may vary): "J'apprends des techniques NLP en Python."

Text Summarization:

from gensim.summarization import summarize  # removed in gensim 4.0; requires gensim < 4.0

text = "Text summarization is the process of condensing a piece of text to its main points. The goal of summarization is to create a condensed version that retains the most important information from the original text. There are several methods for summarization including extraction-based methods and abstraction-based methods. Extraction-based methods select a subset of the words from the original text, while abstraction-based methods generate a new summary by using a model trained on the original text."
summary = summarize(text)
print(summary)
# Output: "There are several methods for summarization including extraction-based methods and abstraction-based methods. Extraction-based methods select a subset of the words from the original text, while abstraction-based methods generate a new summary by using a model trained on the original text."

Word Embeddings (e.g., Word2Vec, GloVe):

from gensim.models import Word2Vec

# Define a toy dataset: a list of tokenized sentences
sentences = [['This', 'is', 'a', 'positive', 'text'],
             ['This', 'is', 'a', 'negative', 'text'],
             ['This', 'is', 'a', 'neutral', 'text']]
# Train the model (vector_size was called size in gensim < 4.0)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
# Access the trained model's word vector
word_vector = model.wv['positive']
print(word_vector)
# Output: array([-1.90734863e-03, -1.52587891e-03, 4.57763672e-04, ...], dtype=float32)

from gensim.models import KeyedVectors
# Load pre-trained GloVe vectors; GloVe files lack the word2vec header line,
# so pass no_header=True (gensim >= 4.0) or convert with glove2word2vec first
model = KeyedVectors.load_word2vec_format('path/to/glove.6B.100d.txt', binary=False, no_header=True)
# Access the word vector
word_vector = model['word']
print(word_vector)
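
Once loaded, KeyedVectors supports similarity queries; continuing from the model above:

# Find the words closest to a query word by cosine similarity
print(model.most_similar('word', topn=3))
# Output: a list of (word, similarity) pairs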

Dependency Parsing:

import spacy
# Load the model (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
# Define a sentence
sentence = "I am learning NLP techniques in Python."
# Apply dependency parsing
doc = nlp(sentence)
for token in doc:
    print(token.text, token.dep_)
# Output (labels may vary slightly across model versions):
# I nsubj
# am aux
# learning ROOT
# NLP compound
# techniques dobj
# in prep
# Python pobj
# . punct
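
spaCy can also render the dependency tree; a quick sketch using the built-in displacy visualizer:

from spacy import displacy

# Serves an interactive visualization of the parse in the browser
displacy.serve(doc, style="dep")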

Topic Modeling:

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Example text corpus: a list of tokenized documents
texts = [["cat", "dog", "rat", "elephant"],
         ["cat", "dog", "rat", "mouse"],
         ["dog", "rat", "mouse"]]
# Create a dictionary from the texts
dictionary = Dictionary(texts)
# Create a Bag-of-Words (BoW) representation of the texts
corpus = [dictionary.doc2bow(text) for text in texts]
# Train an LDA model on the corpus
lda = LdaModel(corpus, num_topics=2, id2word=dictionary)
# Print the topics
for topic_id, topic in lda.print_topics():
    print("Topic:", topic_id + 1)
    print(topic)
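
The trained model can then infer a topic mixture for an unseen document; continuing from the snippet above:

# Infer the topic distribution of a new document
new_doc = ["cat", "mouse"]
bow = dictionary.doc2bow(new_doc)
print(lda.get_document_topics(bow))
# Output: a list of (topic_id, probability) pairs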

Term Frequency:

from collections import Counter
# Example document
document = "This is an example document. It contains several words, such as 'example' and 'document'."
# Tokenize the document (a plain split keeps punctuation attached to words)
tokens = document.split()
# Count the frequency of each token
tf = Counter(tokens)
# Print the term frequency
print(tf)
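
Because split() keeps case and punctuation, 'document.' and 'document' are counted separately above. A minimal normalization sketch, which also computes relative term frequency (count divided by document length):

import re

# Lowercase and keep only alphabetic runs before counting
clean_tokens = re.findall(r"[a-z]+", document.lower())
tf = Counter(clean_tokens)
total = len(clean_tokens)
rel_tf = {term: count / total for term, count in tf.items()}
print(rel_tf['document'])
# Output: 0.14285714285714285 (2 occurrences out of 14 tokens)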



Source: Topic Modelling on Medium
